Variational Bayesian Approach to Movie Rating Prediction

Authors

  • Yew Jin Lim
  • Yee Whye Teh
Abstract

Singular value decomposition (SVD) is a matrix decomposition algorithm that returns the optimal (in the sense of squared error) low-rank decomposition of a matrix. SVD has found widespread use across a variety of machine learning applications, where its output is interpreted as compact and informative representations of data. The Netflix Prize challenge, and collaborative filtering in general, is an ideal application for SVD, since the data is a matrix of ratings given by users to movies. It is thus not surprising to observe that most currently successful teams use SVD, either with an extension, or to interpolate with results returned by other algorithms. Unfortunately, SVD can easily overfit due to the extreme data sparsity of the matrix in the Netflix Prize challenge, and care must be taken to regularize properly. In this paper, we propose a Bayesian approach to alleviate overfitting in SVD, where priors are introduced and all parameters are integrated out using variational inference. We show experimentally that this gives significantly improved results over vanilla SVD. For truncated SVDs of rank 5, 10, 20, and 30, our proposed Bayesian approach achieves 2.2% improvement over a naïve approach, 1.6% improvement over a gradient descent approach dealing with unobserved entries properly, and 0.9% improvement over a maximum a posteriori (MAP) approach.
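To make the baselines named in the abstract concrete, the following is a minimal sketch of a low-rank factorization fitted by gradient descent on the observed ratings only, with an L2 penalty of the kind a MAP estimate under Gaussian priors would produce. It is not the authors' variational Bayesian method, and it is not taken from the paper: the toy rating matrix, the function names `factorize` and `observed_rmse`, and all hyperparameters (`rank`, `lam`, `lr`, `epochs`) are assumptions chosen for illustration.

```python
import numpy as np

def factorize(R, mask, rank=2, lam=0.05, lr=0.01, epochs=2000, seed=0):
    """Fit R ~ U @ V.T on observed entries only, with an L2 penalty.

    Illustrative sketch: hyperparameters are arbitrary, not from the paper.
    """
    rng = np.random.default_rng(seed)
    n_users, n_movies = R.shape
    U = 0.1 * rng.standard_normal((n_users, rank))
    V = 0.1 * rng.standard_normal((n_movies, rank))
    for _ in range(epochs):
        # Residuals, zeroed wherever a rating is unobserved.
        E = mask * (R - U @ V.T)
        # Ascent direction of the penalized (negative squared error) objective.
        U_step = E @ V - lam * U
        V_step = E.T @ U - lam * V
        U += lr * U_step
        V += lr * V_step
    return U, V

def observed_rmse(R, mask, U, V):
    """Root mean squared error over the observed entries only."""
    E = mask * (R - U @ V.T)
    return float(np.sqrt((E ** 2).sum() / mask.sum()))

# Toy user-by-movie rating matrix; 0 marks an unobserved rating (assumption).
R = np.array([
    [5, 4, 0, 1, 0],
    [4, 0, 5, 1, 2],
    [1, 1, 0, 5, 4],
    [0, 2, 1, 4, 5],
], dtype=float)
mask = (R > 0).astype(float)

U0, V0 = factorize(R, mask, epochs=0)   # untrained random factors
rmse_before = observed_rmse(R, mask, U0, V0)

U, V = factorize(R, mask)               # trained factors
rmse_after = observed_rmse(R, mask, U, V)
print(f"observed RMSE before: {rmse_before:.3f}, after: {rmse_after:.3f}")
```

Masking the residuals is what distinguishes this from a naive SVD that treats missing ratings as zeros. The penalty weight `lam` plays the role that the priors play in a MAP estimate, whereas the approach proposed in the abstract integrates the parameters out rather than fixing them at a point estimate.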

Similar resources

Comparison of Variational Bayes and Gibbs Sampling in Reconstruction of Missing Values with Probabilistic Principal Component Analysis

Lately there has been the interest of categorization and pattern detection in large data sets, including the recovering of the dataset missing values. In this project the objective will be to recover the subset of missing values as accurately as possible from a movie rating data set. Initially the data matrix is preprocessed and its elements are divided in training and test sets. Thereafter the...

Movie Rating Prediction

The Internet Movie Database (IMDB) is one of the largest online resources for general movie information combined with a forum in which users can rate movies. We investigate the extent to which a movie’s average user rating can be predicted after learning the relationship between the rating and a movie’s various attributes from a training set. Two methods are evaluated: kernel regression and mod...

Variational Bayesian identification and prediction of stochastic nonlinear dynamic causal models

In this paper, we describe a general variational Bayesian approach for approximate inference on nonlinear stochastic dynamic models. This scheme extends established approximate inference on hidden-states to cover: (i) nonlinear evolution and observation functions, (ii) unknown parameters and (precision) hyperparameters and (iii) model comparison and prediction under uncertainty. Model identific...

Who Rated What: a combination of SVD, correlation and frequent sequence mining

KDD Cup 2007 focuses on predicting aspects of movie rating behavior. We present our prediction method for Task 1 “Who Rated What in 2006” where the task is to predict which users rated which movies in 2006. We use the combination of the following predictors, listed in the order of their efficiency in the prediction: • The predicted number of ratings for each movie based on time series predictio...

Application of Variational Bayesian Approach to Speech Recognition

In this paper, we propose a Bayesian framework, which constructs shared-state triphone HMMs based on a variational Bayesian approach, and recognizes speech based on the Bayesian prediction classification; variational Bayesian estimation and clustering for speech recognition (VBEC). An appropriate model structure with high recognition performance can be found within a VBEC framework. Unlike conv...

Publication date: 2007